Optimization of SVM Parameters for Promoter Recognition in DNA Sequences
Abstract
Recognition of specific, functionally important DNA sequence fragments is considered one of the most important problems in bioinformatics. One type of such fragment is the promoter, i.e., a short regulatory DNA sequence located upstream of a gene. Detecting promoters in DNA sequences is important for successful gene prediction. In this paper, a Support Vector Machine (SVM) is used to classify DNA sequences and recognize promoters. To obtain optimal classification, various SVM learning and kernel parameters (hyperparameters) and methods for optimizing them are analyzed. In a case study, the SVM hyperparameters for linear, polynomial and power series kernels are optimized using a modification of the Nelder-Mead (downhill simplex) algorithm. The method improves the precision with which promoter sequences are identified. Classification results for a Drosophila sequence dataset are presented.
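As a rough illustration of simplex-based hyperparameter search, the sketch below runs the standard, unmodified Nelder-Mead method (not the modification used in the paper) over two hyperparameters expressed on a log scale. Everything in it is an assumption made for illustration: the function names `nelder_mead` and `toy_cv_error`, the choice of two parameters, and above all the objective, which is a smooth stand-in with a known minimum. In practice the objective would be a cross-validated classification error obtained by training the SVM on the promoter dataset.

```python
def nelder_mead(f, x0, step=1.0, max_iter=500, tol=1e-10):
    """Minimize f with the standard (unmodified) Nelder-Mead simplex method."""
    n = len(x0)
    # Initial simplex: the start point plus one point offset along each axis.
    simplex = [list(x0)]
    for i in range(n):
        point = list(x0)
        point[i] += step
        simplex.append(point)
    values = [f(p) for p in simplex]

    for _ in range(max_iter):
        # Order vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break

        worst = simplex[-1]
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point on the line through the centroid, away from the worst vertex.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_r = f(reflected)
        if f_r < values[0]:
            # Reflection is the new best: try expanding further.
            expanded = along(2.0)
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[-2]:
            # Reflection beats the second-worst vertex: accept it.
            simplex[-1], values[-1] = reflected, f_r
        else:
            # Contract: outside if the reflection improved on the worst, else inside.
            if f_r < values[-1]:
                candidate, limit = along(0.5), f_r
            else:
                candidate, limit = along(-0.5), values[-1]
            f_c = f(candidate)
            if f_c < limit:
                simplex[-1], values[-1] = candidate, f_c
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
                values = [values[0]] + [f(p) for p in simplex[1:]]

    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]


def toy_cv_error(params):
    """Hypothetical stand-in for cross-validated SVM error (not real data)."""
    log_c, log_kernel_param = params
    return 0.05 + (log_c - 1.0) ** 2 + 2.0 * (log_kernel_param + 2.0) ** 2


best_params, best_error = nelder_mead(toy_cv_error, [0.0, 0.0])
print(best_params, best_error)
```

Searching in log space is a common convention for positive hyperparameters such as the SVM regularization constant; whether the paper does the same is not stated in the abstract.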
Similar resources
In silico screening of G-Quadruplex Structures in Wilms tumor 1 Gene Promoter
Introduction: X-ray diffraction studies have revealed that guanines in a DNA strand may be arranged in quartets and form structures called G-quadruplexes. Bioinformatics studies have suggested that G-quadruplex structures form in crucial human genes, including Wilms tumor 1 (WT1). The aim of this study was an in silico analysis of the guanine-rich sequence in the promoter region of the WT1 gene...
Finding Exact and Solo LTR-Retrotransposons in Biological Sequences Using SVM
Finding repetitive subsequences in a genome is a challenging problem in bioinformatics research. Many approaches have been proposed to solve the problem; they can be divided into library-based and de novo methods. Library-based methods use predetermined repetitive genome subsequences, whereas library-less methods attempt to discover repetitive subsequences by an analytical approach...
Comparison of Promoter Sequences of Flowering Control Genes, FT1 and Three Versions of VIN3, in Susceptible and Resistant Sugar Beet Genotypes to Bolting
Autumn sowing of sugar beet is a suitable practice in sustainable agriculture. Bolting is an undesirable phenomenon that reduces sugar beet yield, and it is the most important limiting factor in autumn sowing of sugar beet. Identifying and comparing the sequences of flowering genes in various genotypes can help to understand the molecular mechanisms controlling bolting. In the previous studies...
Molecular and Bioinformatics Analysis of Allelic Diversity in IGFBP2 Gene Promoter in Indigenous Makuee and Lori-Bakhtiari Sheep Breeds
The aim of this study was to perform molecular and bioinformatics analysis of the IGFBP2 gene promoter in association with some economic traits in the indigenous Makuee (MS) and Lori-Bakhtiari (LB) breeds. DNA was extracted from blood samples of 120 MS and 200 LB sheep, and a 297 bp fragment from the upstream sequence of the studied gene was amplified and genotyped by single-strand conformational polymo...
Derivation of Context-free Stochastic L-grammar Rules for Promoter Sequence Modeling Using Support Vector Machine
Formal grammars can be used for describing complex repeatable structures such as DNA sequences. In this paper, we describe the structural composition of DNA sequences using a context-free stochastic L-grammar. L-grammars are a special class of parallel grammars that can model the growth of living organisms, e.g. plant development, and model the morphology of a variety of organisms. We believe that...